A cleaner/more streamlined version of rotation_exploration.Rmd for assessing changes in clinical scores pre-post 7T glucest.
load packages
Find the first 7T each participant is done to maximize length of potential follow-ups. Using glucest subject list extracted from .xls (list from David 1/13/22).
Note: doesn’t include scans past June 2019. Will need to update but probably won’t add many subj. with 1 yr post-7T clinical timepoints. Look at slack message from Arianna with subj she’s scanned and their latest clin assess to see if already included
## 'data.frame': 134 obs. of 3 variables:
## $ BBLID : int 12882 15305 15598 16422 16962 17491 17519 17523 17621 17622 ...
## $ X7T_date : chr "" "" "" "" ...
## $ ONM_7T_date: chr "5/22/14" "2/20/14" "8/6/14" "11/25/14" ...
## 'data.frame': 134 obs. of 3 variables:
## $ BBLID : int 12882 15305 15598 16422 16962 17491 17519 17523 17621 17622 ...
## $ X7T_date : Date, format: NA NA ...
## $ ONM_7T_date: Date, format: "2014-05-22" "2014-02-20" ...
Notes * HOW TO ID 22Q PPTS FROM PNC FOR REMOVAL?!?! * no date available for PNC dx, arbitrarily picking Jan 1 2000
##
## FALSE
## 269
## [1] 132
There are 132 (132) subjects who have 7T scans, excluding 22q pts ID’d from Kosha’s RedCap data
Won’t be the focus of our analyses but will be useful for grouping. Merging diagnoses from PNC and Kosha’s RedCap pull:
#first clean redcap dx source info
names(redcap)
## [1] "BBLID" "DODIAGNOSIS" "HSTATUS"
## [4] "AXIS1_DX1" "allaxis1desc" "GASR_CURRENT"
## [7] "AGEONSET_AXIS1" "AGEONSET_CLINICRISK" "dx_pscat"
## [10] "timepoint" "FROM_CAPA" "FROM_DIGS"
## [13] "FROM_SCID" "FROM_FIGS" "FROM_RECORDS"
## [16] "FROM_MINI" "FROM_SIPS"
sources <- c("FROM_CAPA", "FROM_DIGS", "FROM_SCID", "FROM_FIGS", "FROM_RECORDS", "FROM_MINI", "FROM_SIPS")
redcap[,sources] <- redcap[,sources] %>%
transmute_all(funs(ifelse(. == 1, deparse(substitute(.)), NA))) #replace coding with col name
## Warning: `funs()` was deprecated in dplyr 0.8.0.
## Please use a list of either functions or lambdas:
##
## # Simple named list:
## list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`:
## tibble::lst(mean, median)
##
## # Using lambdas
## list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
redcap <- redcap %>% unite(dx_source, c(sources), na.rm = TRUE, remove=FALSE) #merge across - not pretty but gets the point across for now
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(sources)` instead of `sources` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
redcap_dx <- redcap %>%
rename(bblid=BBLID, dx_date=DODIAGNOSIS, dx=dx_pscat) %>%
mutate(dx_date=as.Date(dx_date)) %>%
select (bblid, dx, dx_source, dx_date)
all_dx <- rbind(redcap_dx,goassess)%>%
subset(!(bblid %in% q22$BBLID)) #drop 22q sub
id_date <- all_dx %>% select(bblid, dx_date)
table(duplicated(id_date))#no duplicate rows, all ID and dx_date combos unique!
##
## FALSE
## 9781
Now combine dx with scan dates:
dx_scan_date <- right_join(all_dx, first7t, by = c("bblid" = "BBLID")) %>%
mutate(bblid = as.factor(bblid))
length(unique(dx_scan_date$bblid))
## [1] 132
sum(is.na(dx_scan_date$dx)) #some subjects with scans have no diagnosis from Kosha's redcap or goassess
## [1] 24
dx_missing <- dx_scan_date %>% subset(is.na(dx), select = c(bblid, first_7t_date))
# write.csv(dx_missing, "missing_dx.csv", row.names=FALSE)
There are 24 (24) subj who have 7T scans but are missing diagnoses from redcap/PNC!
Seem to have some data for these ppts in spreadsheets David added/demographics from 7T list - reviewed with David
Proceeding for now with dx available, recoding factors for consistency between PNC and RedCap when possible:
table(dx_scan_date$dx)
##
## noDSMdx OP other pro PS psy TD
## 105 22 46 98 38 34 15
dx_scan_date <- dx_scan_date %>% mutate(dx = as.factor(dx))
dx_scan_date$dx <- fct_collapse(dx_scan_date$dx, "healthy" = c("noDSMdx", "TD"),
"other_dx" = c("OP", "other"),
"psych_spect" = c("psy", "PS", "pro")) #recode factors
#make new df with only baseline dx (closest to 7T)
#calc diff (negative values -> dx before 7t)
dx_scan_date$dx_diff = as.numeric(difftime(dx_scan_date$dx_date, dx_scan_date$first_7t_date, units = "days"))
#select one dx per subject with date closest to &t
dx_baseline <- dx_scan_date %>%
mutate(abs_diff_in_days=abs(dx_diff)) %>%
group_by(bblid) %>%
arrange(abs_diff_in_days) %>%
slice(c(1)) %>%
ungroup() %>%
select(!c(abs_diff_in_days))
summary(dx_baseline$dx_diff) #not great alignment between date of dx and date of scan
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -6957.0 -232.8 -99.5 -240.9 -21.0 839.0 24
ggplot(dx_baseline, aes(x=dx_diff, y=bblid, color=dx)) +
geom_point() +
labs(title="Dx Relative to First 7T Scan", x="Days from 7T Scan") +
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())
## Warning: Removed 24 rows containing missing values (geom_point).
Adding in demographics - will have dx from 7T list in separate column for now, since I’m not sure when/where they came from
#recode & rename variables for clarity
demo <- demo %>% select(-Sex_1M, -Ethnicity, -Age)
demo$sex <- recode(demo$sex, "1" = "M", "2" = "F")
#merge with first 7t list
dx_baseline <- left_join(dx_baseline, demo, by = c("bblid"="BBLID"))
#calculate age at first 7t
dx_baseline$age_scan = as.period(interval(start = dx_baseline$DOB, end = dx_baseline$first_7t_date))$year
dx_baseline <- dx_baseline %>% select(-DOB) #drop DOB since we don't need that PHI anymore
Some subjects have multiple assessments on one day (one from proband one from family, etc); need to be sure to pick only TYPE == "COMBINED" per discussion with David. If no combined, pick proband.
ADD SIPS DATA FROM GOASSESS!
#SIPS + first 7t and dx
sips_date <- inner_join(sips, dx_baseline, by = "bblid")
str(sips_date$bblid)
## Factor w/ 134 levels "14528","15305",..: 2 3 3 4 5 5 5 6 7 7 ...
#calc diff (negative values -> dx before 7t)
sips_date$diff_in_days = as.numeric(difftime(sips_date$DOSIPS, sips_date$first_7t_date, units = "days"))
# drop data more than 1 yr prior to scan
sips_date <- subset(sips_date, diff_in_days >= -365)
str(sips_date$bblid) #no subj lost
## Factor w/ 134 levels "14528","15305",..: 2 3 5 7 7 10 10 11 14 15 ...
min(sips_date$diff_in_days) #confirm
## [1] -354
#find subjects that have a baseline sips and sops up to 364 days post-7T
clin364_sip <- sips_date %>% subset(diff_in_days <= 364, select = bblid) %>% unique()
#find subjects that have a followup sips sops >365 days post-7T
clin365_sip <- sips_date %>% subset(diff_in_days >= 365, select = bblid) %>% unique()
#filter df on date
sips.sops_scan <- subset(sips_date, (bblid %in% clin365_sip$bblid & bblid %in% clin364_sip$bblid))
length(unique(sips.sops_scan$bblid))
## [1] 27
#if subj has more than one assessment for same day, select Combined
sips.sops_scan <- sips.sops_scan %>% mutate(TYPE = as.factor(TYPE))
levels(sips.sops_scan$TYPE)
## [1] "Collateral" "Combined" "FP" "IP" "P"
## [6] "proband" "Proband"
sips.sops_scan$TYPE <- fct_collapse(sips.sops_scan$TYPE, "Proband" = c("Proband", "proband", "P", "FP", "IP")) #combine all proband
sips.sops_scan$TYPE <- factor(sips.sops_scan$TYPE, levels = c("Combined", "Proband", "Collateral"))
#group data to select correct type (defined as lowest ordered level)
sips.sops_scan.filt <- sips.sops_scan %>%
group_by(bblid, DOSIPS) %>%
arrange(TYPE) %>%
slice(c(1)) %>%
ungroup()
length(unique(sips.sops_scan.filt$bblid)) #confirm no subjects deleted
## [1] 27
skimr::skim(sips.sops_scan.filt)
| Name | sips.sops_scan.filt |
| Number of rows | 76 |
| Number of columns | 57 |
| _______________________ | |
| Column type frequency: | |
| character | 13 |
| Date | 3 |
| factor | 7 |
| numeric | 34 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| date_7T | 0 | 1.00 | 10 | 10 | 0 | 27 | 0 |
| PROTOCOL | 0 | 1.00 | 12 | 23 | 0 | 10 | 0 |
| RATER | 0 | 1.00 | 4 | 8 | 0 | 20 | 0 |
| SPD | 13 | 0.83 | 1 | 1 | 0 | 1 | 0 |
| FHPI | 11 | 0.86 | 1 | 1 | 0 | 2 | 0 |
| PRODROMAL | 23 | 0.70 | 1 | 1 | 0 | 2 | 0 |
| CONSIDER | 23 | 0.70 | 1 | 1 | 0 | 2 | 0 |
| REVIEW | 23 | 0.70 | 1 | 1 | 0 | 1 | 0 |
| FIRST_DEG_SCZ | 17 | 0.78 | 1 | 1 | 0 | 2 | 0 |
| SIPS_SOURCEID | 72 | 0.05 | 7 | 7 | 0 | 4 | 0 |
| SOURCE_PROJECT | 72 | 0.05 | 3 | 3 | 0 | 1 | 0 |
| type2 | 0 | 1.00 | 7 | 10 | 0 | 4 | 0 |
| dx_source | 0 | 1.00 | 9 | 39 | 0 | 5 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| DOSIPS | 0 | 1 | 2013-01-10 | 2021-10-06 | 2016-04-20 | 75 |
| dx_date | 0 | 1 | 2013-11-14 | 2018-09-21 | 2015-01-30 | 27 |
| first_7t_date | 0 | 1 | 2013-10-11 | 2018-11-30 | 2014-10-11 | 25 |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| bblid | 0 | 1 | FALSE | 27 | 109: 7, 999: 5, 105: 5, 820: 4 |
| TYPE | 0 | 1 | FALSE | 3 | Pro: 63, Com: 12, Col: 1 |
| dx | 0 | 1 | FALSE | 3 | psy: 51, hea: 16, oth: 9 |
| Diagnostic_Group | 0 | 1 | FALSE | 4 | PRO: 49, TD/: 16, Oth: 6, PSY: 5 |
| Race | 0 | 1 | FALSE | 4 | Bla: 59, Whi: 8, Mix: 5, Asi: 4 |
| sex | 0 | 1 | FALSE | 2 | F: 44, M: 32 |
| ethnicity | 0 | 1 | FALSE | 2 | 2: 73, 1: 3 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| VISITNUM | 4 | 0.95 | 1.07 | 0.31 | 1.0 | 1.00 | 1.0 | 1.00 | 3.0 | ▇▁▁▁▁ |
| P1 | 1 | 0.99 | 2.00 | 1.93 | 0.0 | 0.00 | 2.0 | 3.00 | 6.0 | ▇▅▂▂▃ |
| P2 | 1 | 0.99 | 1.44 | 1.60 | 0.0 | 0.00 | 1.0 | 3.00 | 6.0 | ▇▂▂▁▁ |
| P3 | 1 | 0.99 | 1.00 | 1.57 | 0.0 | 0.00 | 0.0 | 2.00 | 6.0 | ▇▁▂▁▁ |
| P4 | 1 | 0.99 | 1.52 | 1.93 | 0.0 | 0.00 | 0.0 | 3.00 | 6.0 | ▇▁▂▁▂ |
| P5 | 1 | 0.99 | 1.64 | 1.63 | 0.0 | 0.00 | 2.0 | 3.00 | 6.0 | ▇▃▂▂▁ |
| N1 | 1 | 0.99 | 1.13 | 1.41 | 0.0 | 0.00 | 1.0 | 2.00 | 5.0 | ▇▃▁▁▁ |
| N2 | 1 | 0.99 | 0.64 | 1.06 | 0.0 | 0.00 | 0.0 | 1.00 | 4.0 | ▇▂▁▁▁ |
| N3 | 1 | 0.99 | 0.81 | 1.14 | 0.0 | 0.00 | 0.0 | 1.00 | 4.0 | ▇▂▂▁▁ |
| N4 | 1 | 0.99 | 0.49 | 1.14 | 0.0 | 0.00 | 0.0 | 0.00 | 5.0 | ▇▁▁▁▁ |
| N5 | 1 | 0.99 | 0.79 | 1.13 | 0.0 | 0.00 | 0.0 | 1.00 | 4.0 | ▇▂▂▁▁ |
| N6 | 1 | 0.99 | 1.44 | 2.21 | 0.0 | 0.00 | 0.0 | 3.00 | 6.0 | ▇▁▁▁▂ |
| D1 | 1 | 0.99 | 0.60 | 1.10 | 0.0 | 0.00 | 0.0 | 1.00 | 5.0 | ▇▁▁▁▁ |
| D2 | 1 | 0.99 | 0.87 | 1.52 | 0.0 | 0.00 | 0.0 | 1.00 | 6.0 | ▇▁▁▁▁ |
| D3 | 2 | 0.97 | 0.91 | 1.25 | 0.0 | 0.00 | 0.0 | 2.00 | 4.0 | ▇▂▂▂▁ |
| D4 | 1 | 0.99 | 0.37 | 0.82 | 0.0 | 0.00 | 0.0 | 0.00 | 4.0 | ▇▁▁▁▁ |
| G1 | 1 | 0.99 | 1.08 | 1.62 | 0.0 | 0.00 | 0.0 | 2.00 | 6.0 | ▇▁▂▁▁ |
| G2 | 1 | 0.99 | 1.05 | 1.55 | 0.0 | 0.00 | 0.0 | 2.00 | 6.0 | ▇▂▁▁▁ |
| G3 | 1 | 0.99 | 0.00 | 0.00 | 0.0 | 0.00 | 0.0 | 0.00 | 0.0 | ▁▁▇▁▁ |
| G4 | 1 | 0.99 | 0.79 | 1.27 | 0.0 | 0.00 | 0.0 | 1.00 | 6.0 | ▇▁▁▁▁ |
| GAF_C | 10 | 0.87 | 64.76 | 18.01 | 21.0 | 51.00 | 65.0 | 80.75 | 95.0 | ▁▆▇▇▇ |
| GAF_H | 15 | 0.80 | 67.59 | 16.64 | 21.0 | 53.00 | 70.0 | 81.00 | 95.0 | ▁▅▆▇▇ |
| POS_345 | 23 | 0.70 | 1.32 | 1.49 | 0.0 | 0.00 | 1.0 | 2.00 | 5.0 | ▇▃▁▁▁ |
| POS_6 | 23 | 0.70 | 0.19 | 0.74 | 0.0 | 0.00 | 0.0 | 0.00 | 4.0 | ▇▁▁▁▁ |
| NEG_3456 | 23 | 0.70 | 0.77 | 0.97 | 0.0 | 0.00 | 1.0 | 1.00 | 4.0 | ▇▆▂▁▁ |
| DIS_3456 | 23 | 0.70 | 0.53 | 0.80 | 0.0 | 0.00 | 0.0 | 1.00 | 3.0 | ▇▃▁▁▁ |
| GEN_3456 | 23 | 0.70 | 0.47 | 0.80 | 0.0 | 0.00 | 0.0 | 1.00 | 3.0 | ▇▂▁▁▁ |
| GAF_PCTCHG | 14 | 0.82 | -0.60 | 3.50 | -25.0 | 0.00 | 0.0 | 0.00 | 0.0 | ▁▁▁▁▇ |
| POSS_PRO_DX | 48 | 0.37 | 348.34 | 0.12 | 348.1 | 348.30 | 348.4 | 348.40 | 348.4 | ▂▁▁▁▇ |
| timepoint | 0 | 1.00 | 5.36 | 3.77 | 1.0 | 3.00 | 4.5 | 6.25 | 19.0 | ▇▆▁▁▁ |
| ntimepoints | 0 | 1.00 | 7.03 | 4.60 | 2.0 | 4.00 | 6.0 | 7.25 | 19.0 | ▇▅▁▁▂ |
| dx_diff | 0 | 1.00 | 14.24 | 146.00 | -259.0 | -71.00 | -39.5 | 192.00 | 269.0 | ▁▇▃▁▆ |
| age_scan | 0 | 1.00 | 19.50 | 3.17 | 15.0 | 16.00 | 19.0 | 22.00 | 26.0 | ▇▃▅▃▂ |
| diff_in_days | 0 | 1.00 | 574.83 | 777.89 | -304.0 | -55.25 | 392.5 | 958.75 | 2677.0 | ▇▃▃▁▁ |
SOPS seems to be the 12 item components of SIPS (which are overall ratings of severity). Focusing on SOPS (summing within and subscales Pos, Neg, Gen, Disorg) - seem to be valid scoring based on ct.gov.
#filter for SOPS completion
sops_scan <- sips.sops_scan.filt %>%
drop_na(P1) %>%
select(c(1, 7:26, 47:57)) #subset columns
#since dropping NA we need to refilter pre-post subj
##find subjects that have a baseline sops up to 364 days post-7T
clin364_sop <- sops_scan %>% subset(diff_in_days <= 364, select = bblid) %>% unique()
##find subjects that have a followup sips sops >365 days post-7T
clin365_sop <- sops_scan %>% subset(diff_in_days >= 365, select = bblid) %>% unique()
##filter df
sops_scan <- subset(sops_scan, (bblid %in% clin365_sop$bblid & bblid %in% clin364_sop$bblid))
sum(is.na(sops_scan[3:21]))#see missing sops scores
## [1] 1
sops_scan <- sops_scan %>% mutate(sops_p = rowSums(across(3:7), na.rm=F),
sops_n = rowSums(across(8:13), na.rm=F),
sops_d = rowSums(across(14:17), na.rm=F),
sops_g = rowSums(across(c(G1, G2, G3, G4)), na.rm=F), #index not working for some unknown and horribly frustrating reason
sops_tot = rowSums(across(c(sops_p, sops_n, sops_d, sops_g), na.rm=F)))
sum(is.na(sops_scan$sops_tot))
## [1] 1
#plot for fun
ggplot(sops_scan, aes(x=diff_in_days, y=bblid, color=sops_tot)) +
geom_point() +
geom_vline(xintercept=365) +
labs(title="SOPS Relative to First 7T Scan", x="Days from 7T Scan") +
theme(legend.position = "none")
length(unique(sops_scan$bblid))
## [1] 27
sops_tot <- sops_scan %>% drop_na(sops_tot)
length(unique(sops_tot$bblid))
## [1] 27
Dropping subjects with missing items for now, could be used for subscale scores
Looking just at change immediately pre-post 7T:
There are 27 (27) subjects with SOPS at baseline and at least 1 post-1 yr followup, though total pre-post scores are available only for 26 subjects. Negative change -> ppt’s score is getting lower (i.e. symptoms improving)
## bblid dx sops_tot_pre sops_tot_post
## 17622 : 1 healthy : 7 Min. : 0.00 Min. : 0.00
## 18093 : 1 other_dx : 2 1st Qu.: 4.25 1st Qu.: 7.00
## 82051 : 1 psych_spect:17 Median :22.50 Median :16.00
## 83835 : 1 Mean :21.65 Mean :18.92
## 85369 : 1 3rd Qu.:31.75 3rd Qu.:31.25
## 87225 : 1 Max. :74.00 Max. :57.00
## (Other):20
## change_pos change_neg change_dis change_gen
## Min. :-13.0000 Min. :-17.000 Min. :-12.0000 Min. :-7.0000
## 1st Qu.: -4.5000 1st Qu.: -2.750 1st Qu.: -3.0000 1st Qu.:-1.0000
## Median : 0.0000 Median : -1.000 Median : -0.5000 Median : 0.0000
## Mean : -0.2692 Mean : -1.615 Mean : -0.9615 Mean : 0.1154
## 3rd Qu.: 3.2500 3rd Qu.: 1.000 3rd Qu.: 0.0000 3rd Qu.: 0.7500
## Max. : 15.0000 Max. : 7.000 Max. : 7.0000 Max. : 8.0000
##
## change_tot
## Min. :-45.000
## 1st Qu.: -8.000
## Median : -1.500
## Mean : -2.731
## 3rd Qu.: 7.000
## Max. : 24.000
##
Line plot of all SOPS Total for each subj:
## Warning: Removed 1 row(s) containing missing values (geom_path).
Pulling “SIPS” ratings number of Pos, Neg, Gen, etc items on SOPS coded as 3,4,5, or 6 depending on suffix - calculating for subj that aren’t already input. Not planning to use these in final analyses, but already wrote the code.
# #filter df for SOPS item completion
# sips_scan <- dx_sips.sops_scan %>%
# drop_na(POS_345, POS_6, NEG_3456, GEN_3456) %>%
# mutate(POS_3456= rowSums(across(c(POS_345, POS_6)))) #concatenating across frequency of p sxs for consistency across subscales
sips_calc <- sips.sops_scan.filt %>%
drop_na(8:26)
#calc sips
sips_calc$sips_pos <- apply(sips_calc[8:12], 1, function(x) length(which(x>2)))
sips_calc$sips_neg <- apply(sips_calc[13:18], 1, function(x) length(which(x>2)))
sips_calc$sips_dis <- apply(sips_calc[19:22], 1, function(x) length(which(x>2)))
sips_calc$sips_gen <- apply(sips_calc[23:26], 1, function(x) length(which(x>2)))
sips_scan <- sips_calc %>% select(1, 7, 47:61)
#since dropping NA we need to refilter pre-post subj
##find subjects that have a baseline sops up to 364 days post-7T
clin364_sip2 <- sips_scan %>% subset(diff_in_days <= 364, select = bblid) %>% unique()
##find subjects that have a followup sips sops >365 days post-7T
clin365_sip2 <- sips_scan %>% subset(diff_in_days >= 365, select = bblid) %>% unique()
##filter df
sips_scan <- subset(sips_scan, (bblid %in% clin364_sip2$bblid & bblid %in% clin365_sip2$bblid))
length(unique(sips_scan$bblid))
## [1] 26
There are 26 (26) subjects with SIPS at baseline and at least 1 post-1 yr followup
Calculate change across time (i.e. change from one timepoint to the next):
#calculate change in SIPS scores, save and plot
sips_change <- sips_scan %>%
group_by(bblid) %>%
arrange(diff_in_days) %>%
mutate("pos_diff"= sips_pos - lag(sips_pos),
"neg_diff"= sips_neg - lag(sips_neg),
"gen_diff"= sips_gen -lag(sips_gen),
"dis_diff"= sips_dis - lag(sips_dis)) %>%
ungroup()
summary(sips_change)
## bblid DOSIPS dx dx_source
## 109577 : 6 Min. :2013-01-10 healthy :16 Length:73
## 99949 : 5 1st Qu.:2014-12-20 other_dx : 7 Class :character
## 105176 : 5 Median :2016-03-19 psych_spect:50 Mode :character
## 82051 : 4 Mean :2016-12-08
## 139272 : 4 3rd Qu.:2018-06-29
## 83835 : 3 Max. :2021-10-06
## (Other):46
## dx_date first_7t_date dx_diff Diagnostic_Group
## Min. :2013-11-14 Min. :2013-10-11 Min. :-259.00 Other : 6
## 1st Qu.:2014-08-27 1st Qu.:2014-07-16 1st Qu.: -71.00 PRO/CHR:46
## Median :2015-01-19 Median :2014-09-25 Median : -43.00 PSY : 5
## Mean :2015-05-22 Mean :2015-05-11 Mean : 11.53 TD/NC :16
## 3rd Qu.:2015-04-17 3rd Qu.:2015-08-06 3rd Qu.: 192.00
## Max. :2018-09-21 Max. :2018-11-30 Max. : 269.00
##
## Race sex ethnicity age_scan diff_in_days
## Asian : 4 M:30 1: 3 Min. :15.00 Min. :-304.0
## Black/African American:56 F:43 2:70 1st Qu.:16.00 1st Qu.: -56.0
## Mixed : 5 Median :19.00 Median : 387.0
## White : 8 Mean :19.55 Mean : 577.7
## 3rd Qu.:22.00 3rd Qu.:1000.0
## Max. :26.00 Max. :2677.0
##
## sips_pos sips_neg sips_dis sips_gen
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.000 Median :1.0000 Median :0.0000 Median :0.0000
## Mean :1.479 Mean :0.7808 Mean :0.4795 Mean :0.5068
## 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :5.000 Max. :4.0000 Max. :3.0000 Max. :3.0000
##
## pos_diff neg_diff gen_diff dis_diff
## Min. :-3.0000 Min. :-4.0000 Min. :-2.00000 Min. :-2.0000
## 1st Qu.:-1.0000 1st Qu.: 0.0000 1st Qu.: 0.00000 1st Qu.:-1.0000
## Median : 0.0000 Median : 0.0000 Median : 0.00000 Median : 0.0000
## Mean :-0.1702 Mean :-0.1915 Mean :-0.06383 Mean :-0.1702
## 3rd Qu.: 1.0000 3rd Qu.: 0.0000 3rd Qu.: 0.00000 3rd Qu.: 0.0000
## Max. : 4.0000 Max. : 1.0000 Max. : 1.00000 Max. : 2.0000
## NA's :26 NA's :26 NA's :26 NA's :26
#plot longitudinal scores
ggplot(sips_change, aes(x=diff_in_days, y=sips_pos, color=bblid)) +
geom_line() +
labs(title="SIPS Pos")
ggplot(sips_change, aes(x=diff_in_days, y=sips_neg, color=bblid)) +
geom_line() +
labs(title="SIPS Neg")
ggplot(sips_change, aes(x=diff_in_days, y=sips_gen, color=bblid)) +
geom_line() +
labs(title="SIPS Gen")
ggplot(sips_change, aes(x=diff_in_days, y=sips_dis, color=bblid)) +
geom_line() +
labs(title="SIPS Dis")
#plot change
ggplot(sips_change, aes(x=pos_diff)) +
geom_histogram(color="white", fill= "darkgreen") +
labs(title="Change in SIPS Pos")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 26 rows containing non-finite values (stat_bin).
ggplot(sips_change, aes(x=neg_diff)) +
geom_histogram(color="white", fill= "darkgreen") +
labs(title="Change in SIPS Neg")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 26 rows containing non-finite values (stat_bin).
ggplot(sips_change, aes(x=gen_diff)) +
geom_histogram(color="white", fill= "darkgreen") +
labs(title="Change in SIPS Gen")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 26 rows containing non-finite values (stat_bin).
Looking at just change immediately pre-post 7T
#define baseline - scan closest to 7T
#find smallest datediff that's >=365
post <- sips_scan %>%
select(1, 2, 13:17) %>% #filter so we're not duplicating rows that wont change (demographics, 7t date, etc.)
group_by(bblid) %>%
filter(diff_in_days >= 365) %>%
arrange(bblid, diff_in_days) %>%
slice(c(1))%>%
rename_with( ~ paste0(.x, "_post"))
#find datediff that's <365 and closest to 0
pre <- sips_scan %>%
group_by(bblid) %>%
filter(diff_in_days < 365) %>%
mutate(abs_diff_in_days=abs(diff_in_days)) %>%
arrange(bblid, abs_diff_in_days) %>%
slice(c(1)) %>%
select(!c(abs_diff_in_days)) %>%
rename_at(c(2, 13:17), ~ paste0(.x, "_pre"))
#new dataframe
proximal_sips <- inner_join(pre, post, by = c("bblid" = "bblid_post")) %>%
mutate(change_pos = (sips_pos_post - sips_pos_pre),
change_neg = (sips_neg_post - sips_neg_pre),
change_gen = (sips_gen_post - sips_gen_pre))
#plot change
ggplot(proximal_sips, aes(x=change_pos)) +
geom_bar(color="white", fill= "darkgreen") +
labs(title="Change in SIPS Pos")
ggplot(proximal_sips, aes(x=change_neg)) +
geom_bar(color="white", fill= "darkgreen") +
labs(title="Change in SIPS Neg")
ggplot(proximal_sips, aes(x=change_gen)) +
geom_bar(color="white", fill= "darkgreen") +
labs(title="Change in SIPS Gen")
Prime Screen scores were computed by totaling the number of items endorsed at the level of 5 or 6 and via raw sum of scores(SIP003 to SIP014) (per Zak’s paper and scale authors). Combining data from Kosha and David, keeping proband over collateral reports when one subject has both on the same day.
#add prime scores from david's spreadsheets
syrp_prime <- syrp %>%
select(5, 7, 10, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35) %>%
rename(sip003=prime_1, sip004=prime_2, sip005=prime_3, sip006=prime_4, sip007=prime_5, sip008=prime_6, sip009=prime_7, sip010=prime_8, sip011=prime_9, sip012=prime_10, sip013=prime_11, sip014=prime_12, source=admin_proband) %>% #rename to match redcap prime items
mutate(bblid=as.character(bblid)) #setting back to character for now for rbind
prime_s <- prime %>%
mutate(admin_proband = str_detect(redcapid, "P"),
admin_collat = str_detect(redcapid, "C"),
source = case_when(admin_proband == TRUE ~ "p",
admin_collat == TRUE ~ "c")) %>%
select(1, 5, 75, 8:19) %>% #subset columns
mutate(bblid=as.character(bblid)) #setting back to character for now for rbind
#combine both prime data sources
all_prime <- rbind(syrp_prime,prime_s) %>%
drop_na(4:15) %>% #drop rows with missing values
mutate (bblid=as.factor(bblid)) %>% #set bblid back to factor
distinct() #keep only unique rows
#check for multiple assessments per subj per day
id_date <- all_prime %>% select(bblid, date)
table(duplicated(id_date))
##
## FALSE TRUE
## 231 19
#select proband assessment if there are two proband & collateral assesments on one day
all_prime$source <- factor(all_prime$source, ordered = TRUE, levels=c("p", "c"))
prime_p <- all_prime %>%
group_by(bblid, date) %>% #select only the first instance of a specific bblid on a specific day ~ proband
arrange(source) %>%
slice(c(1)) %>%
ungroup()
#join with 7T dates & baseline dx
prime_date <- left_join(prime_p, dx_baseline, by = c("bblid" = "bblid"))
#calc diff (negative values -> prime before 7t)
prime_date$diff_in_days = as.numeric(difftime(prime_date$date, prime_date$first_7t_date, units = "days"))
# drop data more than 1 yr prior to scan
prime_dx <- subset(prime_date, diff_in_days >= -365)
str(prime_date$bblid) #no subj lost
## Factor w/ 140 levels "102041","105176",..: 1 1 1 2 2 2 3 3 3 4 ...
min(prime_date$diff_in_days) #confirm datediff filtered
## [1] NA
#find subjects that have a baseline sips and sops up to 364 days post-7T
clin364_pr <- prime_dx %>% subset(diff_in_days <= 364, select = bblid) %>% unique()
#find subjects that have a followup sips sops >365 days post-7T
clin365_pr <- prime_dx %>% subset(diff_in_days >= 365, select = bblid) %>% unique()
#filter df for appropriate timepoints
prime_scan <- subset(prime_dx, (bblid %in% clin365_pr$bblid & bblid %in% clin364_pr$bblid)) %>%
mutate(prime_sum = rowSums(across(4:15))) #calculate raw sum across items
#calculate prime_scores
prime_scan$count.5 <- apply(prime_scan[4:15], 1, function(x) length(which(x==5)))
prime_scan$count.6 <- apply(prime_scan[4:15], 1, function(x) length(which(x==6)))
prime_scan$prime_score <- rowSums(cbind(prime_scan$count.5, prime_scan$count.6))
#plot for fun
ggplot(prime_scan, aes(x=diff_in_days, y=bblid, color=prime_score)) +
geom_point() +
geom_vline(xintercept=365) +
labs(title="PRIME Score Relative to First 7T Scan", x="Days from 7T Scan")
length(unique(prime_scan$bblid))
## [1] 26
summary(prime_scan)
## bblid date source sip003 sip004
## 83835 : 5 Min. :2014-10-30 p :54 Min. :0.0 Min. :0.0000
## 117397 : 4 1st Qu.:2016-12-17 c : 0 1st Qu.:0.0 1st Qu.:0.0000
## 121407 : 4 Median :2018-04-02 NA's:16 Median :0.0 Median :0.0000
## 89279 : 4 Mean :2018-03-24 Mean :0.9 Mean :0.5429
## 109577 : 3 3rd Qu.:2019-03-15 3rd Qu.:1.0 3rd Qu.:0.0000
## 112126 : 3 Max. :2022-01-20 Max. :6.0 Max. :6.0000
## (Other):47
## sip005 sip006 sip007 sip008 sip009
## Min. :0.0 Min. :0.0000 Min. :0.0 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0 1st Qu.:0.0000 1st Qu.:0.0 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0 Median :0.0000 Median :0.0 Median :0.0000 Median :0.0000
## Mean :0.3 Mean :0.8571 Mean :0.9 Mean :0.4286 Mean :0.2714
## 3rd Qu.:0.0 3rd Qu.:1.0000 3rd Qu.:1.0 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :5.0 Max. :6.0000 Max. :6.0 Max. :6.0000 Max. :6.0000
##
## sip010 sip011 sip012 sip013
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.5143 Mean :0.6571 Mean :0.6143 Mean :0.5571
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :6.0000 Max. :5.0000 Max. :6.0000 Max. :6.0000
##
## sip014 dx dx_source dx_date
## Min. :0.0000 healthy :26 Length:70 Min. :2014-06-30
## 1st Qu.:0.0000 other_dx : 6 Class :character 1st Qu.:2015-01-19
## Median :0.0000 psych_spect:36 Mode :character Median :2015-04-29
## Mean :0.4286 NA's : 2 Mean :2016-04-11
## 3rd Qu.:0.0000 3rd Qu.:2018-02-16
## Max. :6.0000 Max. :2020-10-02
## NA's :2
## first_7t_date dx_diff Diagnostic_Group
## Min. :2014-03-07 Min. :-424.000 Other : 7
## 1st Qu.:2014-08-19 1st Qu.:-131.000 PRO/CHR:29
## Median :2015-08-11 Median : -50.500 PSY : 6
## Mean :2016-05-05 Mean : 8.765 TD/NC :28
## 3rd Qu.:2018-03-30 3rd Qu.: 193.500
## Max. :2019-05-10 Max. : 616.000
## NA's :2
## Race sex ethnicity age_scan diff_in_days
## Asian : 0 M:27 1: 2 Min. :15.00 Min. :-294.0
## Black/African American:52 F:43 2:68 1st Qu.:19.00 1st Qu.: 0.0
## Mixed : 2 Median :20.00 Median : 546.0
## White :16 Mean :20.24 Mean : 688.9
## 3rd Qu.:22.00 3rd Qu.:1265.2
## Max. :26.00 Max. :2746.0
##
## prime_sum count.5 count.6 prime_score
## Min. : 0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 1.000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean : 6.971 Mean :0.2286 Mean :0.2857 Mean :0.5143
## 3rd Qu.: 9.750 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :47.000 Max. :3.0000 Max. :4.0000 Max. :4.0000
##
There are 26 (26) subjects with PRIME scores at baseline and at least 1 post-1 yr followup.
Look at just change immediately pre-post 7T. Check calculations for percent change make sense!
#define baseline - scan closest to 7T
#find smallest datediff that's >=365
post_prime <- prime_scan %>%
select(1:15, 26:30) %>% #filter so we're not duplicating rows that wont change (demographics, 7t date, etc.)
group_by(bblid) %>%
filter(diff_in_days >= 365) %>%
arrange(bblid, diff_in_days) %>%
slice(c(1))%>%
rename_with( ~ paste0(.x, "_post"))
#find datediff that's <365 and closest to 0
pre_prime <- prime_scan %>%
group_by(bblid) %>%
filter(diff_in_days < 365) %>%
mutate(abs_diff_in_days=abs(diff_in_days)) %>%
arrange(bblid, abs_diff_in_days) %>%
slice(c(1)) %>%
select(!c(abs_diff_in_days)) %>%
rename_at(c(2:15, 26:30), ~ paste0(.x, "_pre"))
#new dataframe
proximal_prime <- inner_join(pre_prime, post_prime, by = c("bblid" = "bblid_post")) %>%
mutate(change_prime_sum = (prime_sum_post - prime_sum_pre),#calculate change
change_prime = (prime_score_post - prime_score_pre),
p.change_prime_sum = ((change_prime_sum+1e-10)/(prime_sum_pre+1e-10))*100, #calculating percent change, adding .00000000001 to each to keep from dividing by zero
p.change_prime = ((change_prime+1e-10)/(prime_score_pre+1e-10))*100,
p.change_prime_sum = round(p.change_prime_sum, digits=1), #rounding to undo small addition
p.change_prime = round(p.change_prime, digits=1))
#calculating days between prime timepoints
proximal_prime$prime_date_diff = as.numeric(difftime(proximal_prime$date_post, proximal_prime$date_pre, units = "days"))
proximal_prime %>% select(dx, prime_sum_pre, prime_score_pre, prime_sum_post, prime_score_post, change_prime_sum, change_prime, p.change_prime_sum, p.change_prime, prime_date_diff) %>% summary()
## Adding missing grouping variables: `bblid`
## bblid dx prime_sum_pre prime_score_pre
## 109577 : 1 healthy : 9 Min. : 0.000 Min. :0.0000
## 112126 : 1 other_dx : 2 1st Qu.: 0.000 1st Qu.:0.0000
## 116019 : 1 psych_spect:14 Median : 3.500 Median :0.0000
## 116354 : 1 NA's : 1 Mean : 9.346 Mean :0.7308
## 117397 : 1 3rd Qu.:12.250 3rd Qu.:0.0000
## 120217 : 1 Max. :47.000 Max. :4.0000
## (Other):20
## prime_sum_post prime_score_post change_prime_sum change_prime
## Min. : 0.000 Min. :0.0000 Min. :-35.000 Min. :-4.0000
## 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.: -4.750 1st Qu.: 0.0000
## Median : 0.000 Median :0.0000 Median : -0.500 Median : 0.0000
## Mean : 5.962 Mean :0.4231 Mean : -3.385 Mean :-0.3077
## 3rd Qu.: 9.000 3rd Qu.:0.0000 3rd Qu.: 0.000 3rd Qu.: 0.0000
## Max. :32.000 Max. :4.0000 Max. : 12.000 Max. : 1.0000
##
## p.change_prime_sum p.change_prime prime_date_diff
## Min. :-1.000e+02 Min. :-100.0 Min. : 385
## 1st Qu.:-1.000e+02 1st Qu.: 100.0 1st Qu.: 729
## Median :-1.200e+01 Median : 100.0 Median :1087
## Mean : 6.538e+11 Mean : 65.7 Mean :1068
## 3rd Qu.: 1.000e+02 3rd Qu.: 100.0 3rd Qu.:1179
## Max. : 1.200e+13 Max. : 100.0 Max. :2458
##
#plots
ggplot(proximal_prime, aes(x=change_prime)) +
geom_histogram(color="white", fill= "darkgreen") +
labs(title="Change in PRIME Score")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(proximal_prime, aes(x=change_prime_sum)) +
geom_histogram(color="white", fill= "darkgreen") +
labs(title="Change in PRIME Sums")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Very little change in PRIME score (counts of 5 and 6 rated sxs), more variability across score sums. Negative change -> ppt’s score is getting lower (i.e. symptoms improving).
Plotting PRIME with dx ~ not all points visible due to overlap
#plot absolute change between proximal PRIME scores
ggplot(proximal_prime, aes(x=change_prime_sum, y=prime_sum_pre, color=dx)) +
geom_point() +
geom_text_repel(aes(label = bblid), max.overlaps = 30) +
labs(title="Change in PRIME Sums Over Time", x="Change (Post - Pre)", y="Baseline PRIME Total")
#plot absolute change vs time between proximal GAF Cs
ggplot(proximal_prime, aes(x=prime_date_diff, y=change_prime_sum, color=dx)) +
geom_point() +
geom_text_repel(aes(label = bblid), max.overlaps = 30) +
labs(title="Change in PRIME By Delta Time", x="Time Between PRIME Assessments (Days)", y="Change PRIME (Post - Pre)")
Looking through SIPS GAF current
## [1] 22
## [1] 0
There are 27 (22) subjects with GAFC at baseline and at least 1 post-1 yr followup. Looking at scores across time:
#plot longitudinal scores
ggplot(gafc_scan, aes(x=diff_in_days, y=GAF_C, color=bblid)) +
geom_line() +
labs(title="GAFC")
Look at absolute change from one timepoint to the next:
#calculate change in GAFC scores
gafc_change <- gafc_scan %>%
group_by(bblid) %>%
arrange(diff_in_days) %>%
mutate("gafc_diff"= GAF_C - lag(GAF_C))
summary(gafc_change)
## bblid DOSIPS GAF_C first_7t_date
## 109577 : 6 Min. :2013-01-10 Min. :21.00 Min. :2013-10-11
## 82051 : 4 1st Qu.:2014-12-19 1st Qu.:50.00 1st Qu.:2014-07-20
## 99949 : 4 Median :2015-11-14 Median :60.50 Median :2014-10-27
## 139272 : 4 Mean :2016-07-05 Mean :63.13 Mean :2015-05-26
## 85369 : 3 3rd Qu.:2018-03-11 3rd Qu.:78.25 3rd Qu.:2015-08-11
## 87225 : 3 Max. :2020-10-26 Max. :95.00 Max. :2018-11-30
## (Other):36
## diff_in_days dx dx_source dx_date
## Min. :-304.00 healthy : 8 Length:60 Min. :2013-11-14
## 1st Qu.: -61.25 other_dx : 6 Class :character 1st Qu.:2014-07-31
## Median : 259.50 psych_spect:46 Mode :character Median :2015-01-30
## Mean : 406.13 Mean :2015-05-15
## 3rd Qu.: 737.75 3rd Qu.:2015-06-12
## Max. :1648.00 Max. :2018-09-21
##
## dx_diff gafc_diff
## Min. :-259.00 Min. :-38.000
## 1st Qu.: -71.00 1st Qu.: -8.750
## Median : -50.00 Median : 0.500
## Mean : -11.03 Mean : 1.526
## 3rd Qu.: 34.00 3rd Qu.: 12.500
## Max. : 221.00 Max. : 35.000
## NA's :22
Negative change -> ppt’s score is getting lower (i.e. symptoms worstening). Look at absolute change immediately pre-post 7T
#define baseline - scan closest to 7T
#find smallest datediff that's >=365
post_gafc <- gafc_scan %>%
select(1:3, 5) %>% #filter so we're not duplicating rows that wont change (demographics, 7t date, etc.)
group_by(bblid) %>%
filter(diff_in_days >= 365) %>%
arrange(bblid, diff_in_days) %>%
slice(c(1))%>%
rename_with( ~ paste0(.x, "_post")) %>%
ungroup()
#find datediff that's <365 and closest to 0
pre_gafc <- gafc_scan %>%
group_by(bblid) %>%
filter(diff_in_days < 365) %>%
mutate(abs_diff_in_days=abs(diff_in_days)) %>%
arrange(bblid, abs_diff_in_days) %>%
slice(c(1)) %>%
select(!c(abs_diff_in_days)) %>%
rename_at(c(2,3,5), ~ paste0(.x, "_pre")) %>%
ungroup()
#new dataframe
proximal_gafc <- inner_join(pre_gafc, post_gafc, by = c("bblid" = "bblid_post")) %>%
mutate(change_gafc = (GAF_C_post - GAF_C_pre), #absolute change
p.change_gafc = (change_gafc/GAF_C_pre)*100) #percent change
#calculating days between gafc timepoints
proximal_gafc$gafc_date_diff = as.numeric(difftime(proximal_gafc$DOSIPS_post, proximal_gafc$DOSIPS_pre, units = "days"))
#plot change
ggplot(proximal_gafc, aes(x=change_gafc)) +
geom_bar(color="white", fill= "darkgreen") +
labs(title="Change in GAF Current Immediately Pre-Post 7T")
proximal_gafc %>% select(6, 3, 11, 13:15) %>% summary()
## dx GAF_C_pre GAF_C_post change_gafc
## healthy : 4 Min. :21.00 Min. :21.00 Min. :-38.000
## other_dx : 2 1st Qu.:48.50 1st Qu.:50.25 1st Qu.: -0.750
## psych_spect:16 Median :55.00 Median :69.00 Median : 4.000
## Mean :59.36 Mean :65.14 Mean : 5.773
## 3rd Qu.:76.00 3rd Qu.:80.50 3rd Qu.: 16.000
## Max. :90.00 Max. :95.00 Max. : 35.000
## p.change_gafc gafc_date_diff
## Min. :-43.243 Min. : 250.0
## 1st Qu.: -1.471 1st Qu.: 645.8
## Median : 7.131 Median : 879.0
## Mean : 13.739 Mean : 938.5
## 3rd Qu.: 28.508 3rd Qu.:1240.8
## Max. : 81.395 Max. :1698.0
Plotting GAF C with dx
#plot absolute change vs baseline GAF C
ggplot(proximal_gafc, aes(x=change_gafc, y=GAF_C_pre, color=dx)) +
geom_point() +
geom_text_repel(aes(label = bblid), max.overlaps = 30) +
labs(title="Change in Current GAF Over Time", x="Change (Post - Pre)", y="Baseline GAF C")
#plot absolute change vs time between proximal GAF Cs
ggplot(proximal_gafc, aes(x=gafc_date_diff, y=change_gafc, color=dx)) +
geom_point() +
geom_text_repel(aes(label = bblid), max.overlaps = 30) +
labs(title="Change in Current GAF By Delta Time", x="Time Between GAF Assessments (Days)", y="Change GAF (Post - Pre)")
Note: min value between GAF assessments is less than 1 yr (250 days). Will likely need to filter.
Looking at percent change in current GAF
#plot absolute change vs baseline GAF C
ggplot(proximal_gafc, aes(x=p.change_gafc, y=GAF_C_pre, color=dx)) +
geom_point() +
geom_text_repel(aes(label = bblid), max.overlaps = 30) +
labs(title="Percent Change in Current GAF Over Time", x="Percent Change (Post - Pre)", y="Baseline GAF C")
#plot absolute change vs time between proximal GAF Cs
ggplot(proximal_gafc, aes(x=gafc_date_diff, y=p.change_gafc, color=dx)) +
geom_point() +
geom_text_repel(aes(label = bblid), max.overlaps = 30) +
labs(title="Percent Change in Current GAF By Delta Time", x="Time Between GAF Assessments (Days)", y="Percent Change GAF (Post - Pre)")
Analysis ideas - look at improvers vs worseners? how does that this break down by diagnosis
Looking at GASR data included in diagnosis_long.csv
#drop missing gasr
gasr <- redcap %>% drop_na(GASR_CURRENT)%>%
mutate(BBLID = as.factor(BBLID)) %>%
select(1,2,6)
#join with 7T dates
gasr_date <- inner_join(gasr, dx_baseline, by = c("BBLID" = "bblid"))
#calc diff (negative values -> gasr before 7t)
gasr_date$diff_in_days = as.numeric(difftime(gasr_date$DODIAGNOSIS, gasr_date$first_7t_date, units = "days"))
# drop data more than 1 yr prior to scan
gasr_dx <- subset(gasr_date, diff_in_days >= -365)
min(gasr_dx$diff_in_days) #confirm datediff filtered
## [1] -354
#find subjects that have a baseline sips and sops up to 364 days post-7T
clin364_gasr <- gasr_dx %>% subset(diff_in_days <= 364, select = BBLID) %>% unique()
#find subjects that have a followup sips sops >365 days post-7T
clin365_gasr <- gasr_dx %>% subset(diff_in_days >= 365, select = BBLID) %>% unique()
#filter df for appropriate timepoints
gasr_scan <- subset(gasr_dx, (BBLID %in% clin365_gasr$BBLID & BBLID %in% clin364_gasr$BBLID))
#plot for fun
ggplot(gasr_scan, aes(x=diff_in_days, y=BBLID, color=GASR_CURRENT)) +
geom_point() +
geom_vline(xintercept=365) +
labs(title="GASR Current Relative to First 7T Scan", x="Days from 7T Scan")
length(unique(gasr_scan$BBLID))
## [1] 28
summary(gasr_scan)
## BBLID DODIAGNOSIS GASR_CURRENT dx
## 105176 : 5 Min. :2013-06-09 Min. :21.00 healthy :17
## 83835 : 4 1st Qu.:2015-01-30 1st Qu.:50.00 other_dx : 7
## 109577 : 4 Median :2018-01-10 Median :63.00 psych_spect:49
## 121407 : 4 Mean :2017-04-18 Mean :64.14
## 139272 : 4 3rd Qu.:2018-07-30 3rd Qu.:78.00
## 89367 : 3 Max. :2021-10-06 Max. :95.00
## (Other):49
## dx_source dx_date first_7t_date
## Length:73 Min. :2014-01-08 Min. :2014-03-07
## Class :character 1st Qu.:2014-10-30 1st Qu.:2014-07-22
## Mode :character Median :2015-01-19 Median :2014-11-07
## Mean :2015-08-10 Mean :2015-08-03
## 3rd Qu.:2015-06-16 3rd Qu.:2015-08-26
## Max. :2018-09-21 Max. :2018-11-30
##
## dx_diff Diagnostic_Group Race sex
## Min. :-259.000 Other : 9 Asian : 2 M:31
## 1st Qu.: -71.000 PRO/CHR:42 Black/African American:56 F:42
## Median : -50.000 PSY : 5 Mixed : 5
## Mean : 6.863 TD/NC :17 White :10
## 3rd Qu.: 120.000
## Max. : 269.000
##
## ethnicity age_scan diff_in_days
## 1: 3 Min. :15.00 Min. :-304.0
## 2:70 1st Qu.:17.00 1st Qu.: -55.0
## Median :20.00 Median : 515.0
## Mean :19.58 Mean : 623.6
## 3rd Qu.:21.00 3rd Qu.:1182.0
## Max. :26.00 Max. :2677.0
##
There are 28 (28) unique subjects with GASR current scores w/in 1 yr of 7T and after 1 yr follow-up. Looking at scores across time:
#plot longitudinal scores
ggplot(gasr_scan, aes(x=diff_in_days, y=GASR_CURRENT, color=BBLID)) +
geom_line() +
labs(title="GASR")
Look at absolute change from one timepoint to the next:
#calculate change in GAFC scores
gasr_change <- gasr_scan %>%
group_by(BBLID) %>%
arrange(diff_in_days) %>%
mutate("gasr_diff"= GASR_CURRENT - lag(GASR_CURRENT))
summary(gasr_change)
## BBLID DODIAGNOSIS GASR_CURRENT dx
## 105176 : 5 Min. :2013-06-09 Min. :21.00 healthy :17
## 83835 : 4 1st Qu.:2015-01-30 1st Qu.:50.00 other_dx : 7
## 109577 : 4 Median :2018-01-10 Median :63.00 psych_spect:49
## 121407 : 4 Mean :2017-04-18 Mean :64.14
## 139272 : 4 3rd Qu.:2018-07-30 3rd Qu.:78.00
## 89367 : 3 Max. :2021-10-06 Max. :95.00
## (Other):49
## dx_source dx_date first_7t_date
## Length:73 Min. :2014-01-08 Min. :2014-03-07
## Class :character 1st Qu.:2014-10-30 1st Qu.:2014-07-22
## Mode :character Median :2015-01-19 Median :2014-11-07
## Mean :2015-08-10 Mean :2015-08-03
## 3rd Qu.:2015-06-16 3rd Qu.:2015-08-26
## Max. :2018-09-21 Max. :2018-11-30
##
## dx_diff Diagnostic_Group Race sex
## Min. :-259.000 Other : 9 Asian : 2 M:31
## 1st Qu.: -71.000 PRO/CHR:42 Black/African American:56 F:42
## Median : -50.000 PSY : 5 Mixed : 5
## Mean : 6.863 TD/NC :17 White :10
## 3rd Qu.: 120.000
## Max. : 269.000
##
## ethnicity age_scan diff_in_days gasr_diff
## 1: 3 Min. :15.00 Min. :-304.0 Min. :-38.0000
## 2:70 1st Qu.:17.00 1st Qu.: -55.0 1st Qu.:-10.0000
## Median :20.00 Median : 515.0 Median : 0.0000
## Mean :19.58 Mean : 623.6 Mean : -0.5333
## 3rd Qu.:21.00 3rd Qu.:1182.0 3rd Qu.: 7.0000
## Max. :26.00 Max. :2677.0 Max. : 35.0000
## NA's :28
Negative change -> ppt’s score is getting lower (i.e. symptoms worstening). Look at absolute change immediately pre-post 7T
#define baseline - scan closest to 7T
#find smallest datediff that's >=365
post_gasr <- gasr_scan %>%
select(1:3, 14) %>% #filter so we're not duplicating rows that wont change (demographics, 7t date, etc.)
group_by(BBLID) %>%
filter(diff_in_days >= 365) %>%
arrange(BBLID, diff_in_days) %>%
slice(c(1))%>%
rename_with( ~ paste0(.x, "_post")) %>%
ungroup()
#find datediff that's <365 and closest to 0
pre_gasr <- gasr_scan %>%
group_by(BBLID) %>%
filter(diff_in_days < 365) %>%
mutate(abs_diff_in_days=abs(diff_in_days)) %>%
arrange(BBLID, abs_diff_in_days) %>%
slice(c(1)) %>%
select(!c(abs_diff_in_days)) %>%
rename_at(c(2,3,14), ~ paste0(.x, "_pre")) %>%
ungroup()
#new dataframe
proximal_gasr <- inner_join(pre_gasr, post_gasr, by = c("BBLID" = "BBLID_post")) %>%
mutate(change_gasr = (GASR_CURRENT_post - GASR_CURRENT_pre), #absolute change
p.change_gasr = (change_gasr/GASR_CURRENT_pre)*100) #percent change
#calculating days between gafc timepoints
proximal_gasr$gasr_date_diff = as.numeric(difftime(proximal_gasr$DODIAGNOSIS_post, proximal_gasr$DODIAGNOSIS_pre, units = "days"))
#plot change
ggplot(proximal_gasr, aes(x=change_gasr)) +
geom_bar(color="white", fill= "darkgreen") +
labs(title="Change in GASR Current Immediately Pre-Post 7T")
summary(proximal_gasr)
## BBLID DODIAGNOSIS_pre GASR_CURRENT_pre dx
## 18093 : 1 Min. :2014-01-08 Min. :21.00 healthy : 6
## 82051 : 1 1st Qu.:2014-11-14 1st Qu.:49.50 other_dx : 3
## 83835 : 1 Median :2015-02-18 Median :56.00 psych_spect:19
## 85369 : 1 Mean :2015-10-13 Mean :61.46
## 87225 : 1 3rd Qu.:2016-05-28 3rd Qu.:80.00
## 88760 : 1 Max. :2018-09-21 Max. :90.00
## (Other):22
## dx_source dx_date first_7t_date dx_diff
## Length:28 Min. :2014-01-08 Min. :2014-03-07 Min. :-259.00
## Class :character 1st Qu.:2014-11-14 1st Qu.:2014-07-22 1st Qu.: -90.25
## Mode :character Median :2015-02-18 Median :2015-01-21 Median : -53.00
## Mean :2015-10-13 Mean :2015-10-25 Mean : -12.29
## 3rd Qu.:2016-05-28 3rd Qu.:2016-08-19 3rd Qu.: 45.75
## Max. :2018-09-21 Max. :2018-11-30 Max. : 269.00
##
## Diagnostic_Group Race sex ethnicity age_scan
## Other : 3 Asian : 1 M:12 1: 1 Min. :15.00
## PRO/CHR:17 Black/African American:21 F:16 2:27 1st Qu.:17.00
## PSY : 2 Mixed : 2 Median :20.00
## TD/NC : 6 White : 4 Mean :19.68
## 3rd Qu.:21.25
## Max. :26.00
##
## diff_in_days_pre DODIAGNOSIS_post GASR_CURRENT_post diff_in_days_post
## Min. :-259.00 Min. :2015-11-08 Min. :21.00 Min. : 387.0
## 1st Qu.: -90.25 1st Qu.:2017-12-30 1st Qu.:52.50 1st Qu.: 648.2
## Median : -53.00 Median :2018-04-29 Median :65.50 Median : 934.5
## Mean : -12.29 Mean :2018-06-27 Mean :64.25 Mean : 976.6
## 3rd Qu.: 45.75 3rd Qu.:2019-05-24 3rd Qu.:78.00 3rd Qu.:1280.0
## Max. : 269.00 Max. :2021-05-13 Max. :90.00 Max. :2489.0
##
## change_gasr p.change_gasr gasr_date_diff
## Min. :-38.000 Min. :-43.243 Min. : 385.0
## 1st Qu.: -3.000 1st Qu.: -4.824 1st Qu.: 651.0
## Median : 1.500 Median : 2.136 Median :1079.5
## Mean : 2.786 Mean : 9.388 Mean : 988.9
## 3rd Qu.: 13.750 3rd Qu.: 26.434 3rd Qu.:1255.5
## Max. : 35.000 Max. : 81.395 Max. :2220.0
##
Plotting GASR with dx
#plot absolute change vs baseline GASR
ggplot(proximal_gasr, aes(x=change_gasr, y=GASR_CURRENT_pre, color=dx)) +
geom_point() +
geom_text_repel(aes(label = BBLID), max.overlaps = 30) +
labs(title="Change in GASR Over Time", x="Change (Post - Pre)", y="Baseline GASR")
#plot absolute change vs time between proximal GASR
ggplot(proximal_gasr, aes(x=gasr_date_diff, y=change_gasr, color=dx)) +
geom_point() +
geom_text_repel(aes(label = BBLID), max.overlaps = 30) +
labs(title="Change in GASR By Delta Time", x="Time Between GASR Assessments (Days)", y="Change GASR (Post - Pre)")
Note: min value between GASR assessments is less than 1 yr (250 days). Will likely need to filter.
Looking at percent change in GASR
#plot absolute change vs baseline GASR
ggplot(proximal_gasr, aes(x=p.change_gasr, y=GASR_CURRENT_pre, color=dx)) +
geom_point() +
geom_text_repel(aes(label = BBLID), max.overlaps = 30) +
labs(title="Percent Change in GASR Over Time", x="Percent Change (Post - Pre)", y="Baseline GASR")
#plot absolute change vs time between proximal GASR
ggplot(proximal_gasr, aes(x=gasr_date_diff, y=p.change_gasr, color=dx)) +
geom_point() +
geom_text_repel(aes(label = BBLID), max.overlaps = 30) +
labs(title="Percent Change in GASR By Delta Time", x="Time Between GASR Assessments (Days)", y="Percent Change GASR (Post - Pre)")
Make one big clinical dataset that can be subsectioned to look at different constructs but will also easily show timepoints of subjects (eg on this date, ppt X had diagnosis, SIPS and PRIME). Merge additional clinical info (eg added by David) into our sample to potentially use as a predictor
Merging using dx_scan_date as our base, since it contains the 7t list & all diagnoses with 22q subj removed. * leave SYRP for 2nd to last since it has overlapping assessments with other dfs from Kosha (will need to re-rename prime cols) * merge demo & dx_scan_date last and not by date so info gets applied to ALL timepoints * calc diff_assess from 7t to date_assess
#list all subj so we can make sure all data is 7t subj with no 22q ~ then we can just use full joins
all_subj <- unique(dx_scan_date$bblid)
#clean cgas
summary(cgas)
## redcapid bblid protocol_number date
## Length:93 Min. : 14528 Length:93 Min. :2017-12-19
## Class :character 1st Qu.: 87646 Class :character 1st Qu.:2018-05-12
## Mode :character Median : 92155 Mode :character Median :2019-01-29
## Mean : 86345 Mean :2019-09-11
## 3rd Qu.:105176 3rd Qu.:2021-02-25
## Max. :139272 Max. :2022-01-20
##
## cgas cgas_gaf001 cgas_gaf002 cgas_gaf003
## Min. :0.0000 Min. :45.00 Min. :45.00 Min. :60.00
## 1st Qu.:0.5000 1st Qu.:71.00 1st Qu.:51.00 1st Qu.:66.00
## Median :1.0000 Median :75.00 Median :70.00 Median :69.00
## Mean :0.7333 Mean :75.78 Mean :68.67 Mean :72.75
## 3rd Qu.:1.0000 3rd Qu.:78.00 3rd Qu.:76.00 3rd Qu.:75.75
## Max. :1.0000 Max. :99.00 Max. :99.00 Max. :93.00
## NA's :78 NA's :84 NA's :84 NA's :89
## cgas_review cgas_review_assessor cgas_review_date cgas_review_notes
## Min. :0.0000 Length:93 Length:93 Length:93
## 1st Qu.:0.0000 Class :character Class :character Class :character
## Median :0.0000 Mode :character Mode :character Mode :character
## Mean :0.1111
## 3rd Qu.:0.0000
## Max. :1.0000
## NA's :84
## cgas_review_assign cgas_review_status cgas_review_complete_date
## Length:93 Min. :3 Length:93
## Class :character 1st Qu.:3 Class :character
## Mode :character Median :3 Mode :character
## Mean :3
## 3rd Qu.:3
## Max. :3
## NA's :92
## child_global_assessment_scale_cgas_complete admin_markup_redcapid_issue_ignore
## Min. :0.0000 Mode:logical
## 1st Qu.:0.0000 NA's:93
## Median :0.0000
## Mean :0.2366
## 3rd Qu.:0.0000
## Max. :2.0000
##
## redcapid_check imp_redcapid
## Mode:logical Mode:logical
## NA's:93 NA's:93
##
##
##
##
##
cgas_clean <- cgas %>%
mutate(bblid=as.factor(bblid),
date_assess=as.Date(date, format="%Y-%m-%d")) %>% #clean values
select(2, 5:8,20) %>% # select cols
subset(bblid %in% all_subj) #make sure correct subj list
#now sips clean & merge
sips_clean <- sips_calc %>% select(1, 7:27, 58:61) %>% #sips_calc already filtered for correct source (proband>collateral) for each timepoint, frequency calculated
rename(date_assess = DOSIPS) %>%
subset(bblid %in% all_subj) #make sure correct subj list
#merge
merge_1 <- full_join(sips_clean, cgas_clean, by = c("bblid", "date_assess"))
#prime clean and merge (will add rest of syrp later)
names(prime_p) #version selected for correct source (proband>collateral) for each timepoint, merged prime.csv and syrp primes
## [1] "bblid" "date" "source" "sip003" "sip004" "sip005" "sip006" "sip007"
## [9] "sip008" "sip009" "sip010" "sip011" "sip012" "sip013" "sip014"
prime_clean <- prime_p %>% rename(date_assess = date, prime_source=source)%>%
subset(bblid %in% all_subj) #make sure correct subj list
#calculate prime_scores
prime_clean$count.5 <- apply(prime_clean[4:15], 1, function(x) length(which(x==5)))
prime_clean$count.6 <- apply(prime_clean[4:15], 1, function(x) length(which(x==6)))
prime_clean$prime_score <- rowSums(cbind(prime_clean$count.5, prime_clean$count.6))
prime_clean <- prime_clean %>% select(-count.5, -count.6) #remove count.5 and count.6
#merge
merge_2 <- full_join(merge_1, prime_clean, by = c("bblid", "date_assess"))
#clean dx and merge
all_dx_clean <- all_dx %>% rename(date_assess=dx_date) %>%
mutate(bblid = as.factor(bblid)) %>%
subset(bblid %in% all_subj) #make sure correct subj list
#merge
merge_3 <-full_join(merge_2, all_dx_clean, by = c("bblid", "date_assess"))
#clean syrp and merge ~ several proband assessments on same day, how to filter by admin_event?
syrp_clean <- syrp %>% rename(date_assess=date) %>%
select(!c(13:37)) %>% #drop PRIME ~ maybe clean this up more later as well, lots of columns!
subset(bblid %in% all_subj) #make sure correct subj list
#merge
merge_4 <- full_join(merge_3, syrp_clean, by = c("bblid", "date_assess"))
length(unique(merge_4$bblid))
## [1] 130
#merge demographics and first scan date since they'll get applied to all clinical timepoints
first7t <- first7t %>% mutate(BBLID=as.factor(BBLID))
merge_demo.scan <- full_join(demo, first7t, by = c("BBLID")) %>%
subset(BBLID %in% all_subj) #make sure correct subj list
#merge demo & first scan into clinical data
all_data <- full_join(merge_demo.scan, merge_4, by = c("BBLID" = "bblid"))
length(unique(all_data$BBLID)) #should always have 132 subj
## [1] 132
#also making a version withough sryp because it has multiple proband assessments per day/complicates
all_data_nosyrp <- full_join(merge_demo.scan, merge_3, by = c("BBLID" = "bblid"))
length(unique(all_data_nosyrp$BBLID)) #should always have 132 subj
## [1] 132
#calculate datediff and filter for >1 yr pre-7t
all_data_nosyrp$date_diff = as.numeric(difftime(all_data_nosyrp$date_assess, all_data_nosyrp$first_7t_date, units = "days"))
all_data_nosyrp <- all_data_nosyrp %>% subset(date_diff >= -365)
length(unique(all_data_nosyrp$BBLID))
## [1] 122
There are 122 (122) subjects who have any kind of potentially usable clinical data (e.g. not from >1 yr prior to scan). `
There are 24 (24) subj who have 7T scans but are missing diagnoses from redcap/PNC. Writing to csv alongside info on the scan protocol and clinical data David added to Box (syrp spreadsheets) to try to figure out where these individuals are from/where more clinical data could be found.
#list missing dx ppts
dx_missing_info <- dx_baseline %>% subset(is.na(dx)) %>%
select(!c(3,4,6,8:10))
#merge in syrp data to see if david has other clinical info
extra_redcap <- syrp[1:11]
dx_missing_info <- left_join(dx_missing_info, extra_redcap, by="bblid") %>%
mutate(protocol_num = as.factor(protocol_numbe),
redcap_id = as.factor(redcap_id),
protocol = as.factor(protocol),
libi_id = as.factor(libi_id),
location = as.factor(location),
admin_proband = as.factor(admin_proband),
admin_event = as.factor(admin_event))
summary(dx_missing_info)
## bblid dx first_7t_date Diagnostic_Group
## 19783 : 5 healthy : 0 Min. :2014-05-08 Other : 9
## 19971 : 5 other_dx : 0 1st Qu.:2018-07-09 PRO/CHR: 0
## 19974 : 5 psych_spect: 0 Median :2019-04-12 PSY : 3
## 19790 : 4 NA's :75 Mean :2018-11-28 TD/NC :41
## 19798 : 4 3rd Qu.:2019-05-06 NA's :22
## 20523 : 4 Max. :2019-06-19
## (Other):48
## age_scan redcap_id admin_time_stamp protocol
## Min. :18.00 24 : 1 Length:75 7Tdep_pilot:31
## 1st Qu.:19.00 25 : 1 Class :character NARSAD :21
## Median :21.00 37 : 1 Mode :character SYRP : 6
## Mean :21.74 38 : 1 URF :15
## 3rd Qu.:23.00 39 : 1 NA's : 2
## Max. :30.00 (Other):68
## NA's :17 NA's : 2
## protocol_numbe libi_id date assessor
## Min. :818621 NA's:75 Min. :2018-02-28 Length:75
## 1st Qu.:825834 1st Qu.:2018-06-25 Class :character
## Median :825940 Median :2019-04-15 Mode :character
## Mean :825981 Mean :2018-12-30
## 3rd Qu.:825940 3rd Qu.:2019-05-03
## Max. :828621 Max. :2019-06-19
## NA's :2 NA's :2
## location admin_proband admin_event protocol_num
## 3535 Market: 1 p :73 postscan:23 818621: 5
## Gates 10 :46 NA's: 2 prescan :28 825834:31
## HUP6 : 4 selfr_1 :22 825940:21
## SC7T :22 NA's : 2 828621:16
## NA's : 2 NA's : 2
##
##
write.csv(dx_missing_info, "missing_dx.csv", row.names=FALSE, na="")